Unified Gibbs Method for Biological Sequence Analysis

Author

  • Jun S. Liu
Abstract

The biotechnology revolution stems from rapid advances in the biological sciences. One important product of these advances is a large and rapidly growing data base of biopolymer (DNA, RNA, and protein) sequences, which has attracted much attention from researchers in different fields. The great majority of the techniques generated for studying these data have been designed to analyze a single sequence or for the comparison of a pair of sequences. Multiple sequence analysis has remained a difficult challenge. In recent years, formal statistical models have shown potential in one such problem, multiple sequence alignment. In this article we describe a general statistical paradigm, the unified Gibbs method, for the conversion of nearly any existing method for the analysis of a single sequence or for the comparison of a pair of sequences into a multiple sequence analysis method. Our previous successful experiences with the unified Gibbs include the development of the site sampler, the motif sampler, and the PROBE. Here we demonstrate again the power of such a paradigm by describing a multiple sequence partitioning method for the delineation of subsequences indicative of underlying structural features. We also show that the simple Bayesian framework is useful for model selections even for pairwise sequence comparisons.

Similar resources

3 Bayesian Methods in Biological Sequence Analysis

Hidden Markov models, the expectation–maximization algorithm, and the Gibbs sampler were introduced for biological sequence analysis in the early 1990s. Since then the use of formal statistical models and inference procedures has revolutionized the field of computational biology. This chapter reviews the hidden Markov and related models, as well as their Bayesian inference procedures and algorithms...

Finding Regulatory Elements Using Joint Likelihoods for Sequence and Expression Profile Data

A recent, popular method of finding promoter sequences is to look for conserved motifs upstream of genes clustered on the basis of expression data. This method presupposes that the clustering is correct. Theoretically, one should be better able to find promoter sequences and create more relevant gene clusters by taking a unified approach to these two problems. We present a likelihood function f...

Protein Folding: The Gibbs Free Energy

The fundamental law for protein folding is the Thermodynamic Principle: the amino acid sequence of a protein determines its native structure and the native structure has the minimum Gibbs free energy. If all chemical problems can be answered by quantum mechanics, there should be a quantum mechanics derivation of Gibbs free energy formula G(X) for every possible conformation X of the protein. We...

An improved Gibbs sampling method for motif discovery via sequence weighting.

The discovery of motifs in DNA sequences remains a fundamental and challenging problem in computational molecular biology and regulatory genomics, although a large number of computational methods have been proposed in the past decade. Among these methods, the Gibbs sampling strategy has shown great promise and is routinely used for finding regulatory motif elements in the promoter regions of co...

Publication date: 2007